Building Intelligent Systems for Mining Information Extraction Rules from Web Pages by Using Domain Knowledge
Authors
Abstract
Previous research on automatic information extraction has had difficulty acquiring and representing useful domain knowledge and coping with the structural heterogeneity among different information sources. As a result, many real-world information sources with complex document structures could not be analyzed correctly. To resolve these problems, this paper presents a method of building intelligent systems that use domain knowledge to mine information extraction rules from semi-structured Web pages. The system automatically generates a wrapper for each information source and performs information extraction and integration by applying this wrapper to the corresponding source. Both the domain knowledge and the wrapper are represented as XML documents to increase flexibility and interoperability. Tests of our prototype system on several real-estate information sites indicate that it creates correct wrappers for most Web sources and consequently facilitates effective information extraction from heterogeneous information sources.
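The idea of a wrapper stored as an XML document and applied to its source can be illustrated with a minimal sketch. This is not the paper's wrapper format or rule-mining method: the `<wrapper>` and `<rule>` elements, the regular-expression patterns, the field names, and the sample page fragment are all assumptions made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical wrapper document: one extraction rule per domain field.
# The element names, attributes, and patterns are illustrative assumptions.
WRAPPER_XML = r"""
<wrapper source="example-real-estate-site">
  <rule field="price" pattern="Price:\s*\$([0-9,]+)"/>
  <rule field="bedrooms" pattern="([0-9]+)\s*bedrooms?"/>
</wrapper>
"""


def load_rules(wrapper_xml):
    """Parse the XML wrapper into a {field: compiled regex} mapping."""
    root = ET.fromstring(wrapper_xml.strip())
    return {
        rule.get("field"): re.compile(rule.get("pattern"), re.IGNORECASE)
        for rule in root.iter("rule")
    }


def extract(page_text, rules):
    """Apply each rule to the page and keep the first captured value."""
    record = {}
    for field, pattern in rules.items():
        match = pattern.search(page_text)
        if match:
            record[field] = match.group(1)
    return record


# Made-up fragment of a semi-structured listing page.
PAGE = "<tr><td>Price: $250,000</td><td>3 bedrooms</td></tr>"

record = extract(PAGE, load_rules(WRAPPER_XML))
print(record)
```

Keeping the rules in a separate XML document means a new source can be supported by supplying a different wrapper while the extraction code stays unchanged, which is one way to read the flexibility argument in the abstract.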
Similar resources
Presenting a method for extracting structured domain-dependent information from Farsi Web pages
Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...
Query Architecture Expansion in Web Using Fuzzy Multi Domain Ontology
As the Web grows, establishing a general framework for mining and retrieving structured data from it poses many challenges. Creating an ontology is a step towards solving this problem. An ontology captures the main entities and concepts of the data being mined. In this paper, we propose a method for applying "meaning" in the search system, but the problem ...
Automated Information Extraction using Amorphic
The Amorphic system is an adaptive web information extraction scheme for building intelligent systems for mining information from web pages. It can locate data of interest based on domain-knowledge or page structure, can automatically generate a wrapper for an information source, and can detect when the structure of a web-based resource has changed and act on this knowledge to search the update...
Journal of International Scientific Publications
In recent years, several approaches have been proposed to extract information from web pages on the internet. In this research, a key technique combines crawling and ontology to discover knowledge from the web. In this paper, we present an intelligent crawling system that uses patterns and an ontology to extract particular information from web sites. The system is developed as an efficient tool to con...
Automatic Rule Retrieval from Websites Using Ontology and Text Mining
A rule-based system, such as an intelligent service-comparison portal, may compare product prices, shipping options, refund options, etc. Such a rule-based system requires an automatic procedure for acquiring knowledge from the Web, which consists of unstructured text. Knowledge acquisition can be carried out by ontology acquisition and rule acquisition. Obtaining information such as product prices from w...